# Truly Optimal Inverse Propensity Scoring for Off-Policy Evaluation with Multiple Loggers

## Overview
This repository contains code for replicating the experiments from the paper:  
**"Truly Optimal Inverse Propensity Scoring for Off-Policy Evaluation with Multiple Loggers"** (under submission).

## Running the Docker Container
```bash
# Build the Docker image
docker build -t oips-minimal .
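# Run the container interactively, mounting the current directory at /app
# (--gpus all requires the NVIDIA Container Toolkit on the host)
docker run --gpus all --rm -it --entrypoint bash -v "$(pwd)":/app oips-minimal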
```

## Running the Code

First, run `prepare_experiment.py` to generate action data from the target evaluation policy.  
For example, to generate 200 independent action datasets from the `optdigits` dataset (stored under `data/optdigits`), with 70% of the data used for off-policy evaluation (OPE):

```bash
python prepare_experiment.py -d optdigits -n 200 -ope 0.7
```

Experiment configurations are in the `configs` folder. Each `.jsonl` file specifies a set of experiments (jobs), where each line corresponds to a single experiment. To run experiments defined in `job_name.jsonl`:

```bash
python run_experiment.py -c job_name.jsonl
```
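
As a rough illustration of the JSONL layout described above (one JSON object per line, one line per experiment), the sketch below parses such a file's contents. The keys `dataset` and `n_loggers` are hypothetical placeholders, not the repository's actual schema; check the files in `configs` for the real fields.

```python
import json


def load_jobs(text):
    """Parse JSONL text into a list of dicts, one per non-empty line."""
    jobs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between experiments
        try:
            jobs.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc
    return jobs


# Hypothetical example content; the keys are illustrative assumptions only.
example = (
    '{"dataset": "optdigits", "n_loggers": 2}\n'
    '{"dataset": "letter", "n_loggers": 2}\n'
)
jobs = load_jobs(example)
print(len(jobs), "experiments")
```

Reporting the offending line number makes a malformed entry in a long job file easier to locate.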

Relative RMSE benchmark results will be saved under `log/job_name`.

## Reproducing Paper Results

To replicate the results reported in our paper, run the following commands in order (all data preparation steps first, then the experiments):

```bash
python prepare_experiment.py -d letter -n 200 -ope 0.7
python prepare_experiment.py -d optdigits -n 200 -ope 0.7
python prepare_experiment.py -d pendigits -n 200 -ope 0.7
python prepare_experiment.py -d sat -n 200 -ope 0.7

python run_experiment.py -c 2_loggers_alpha_0.95.jsonl
python run_experiment.py -c 2_loggers_alpha_0.80.jsonl
python run_experiment.py -c 2_loggers_alpha_0.65.jsonl
python run_experiment.py -c 3_loggers.jsonl
python run_experiment.py -c 5_loggers.jsonl
```
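
The same sequence can be scripted. The sketch below is a hypothetical convenience wrapper, not part of the repository: it builds the nine commands listed above and, by default, only prints them. The names `build_commands` and `run_all` are assumptions introduced here.

```python
import subprocess
import sys

DATASETS = ["letter", "optdigits", "pendigits", "sat"]
CONFIGS = [
    "2_loggers_alpha_0.95.jsonl",
    "2_loggers_alpha_0.80.jsonl",
    "2_loggers_alpha_0.65.jsonl",
    "3_loggers.jsonl",
    "5_loggers.jsonl",
]


def build_commands(python=sys.executable):
    """Return the argv lists: data preparation first, then experiments."""
    prepare = [
        [python, "prepare_experiment.py", "-d", d, "-n", "200", "-ope", "0.7"]
        for d in DATASETS
    ]
    run = [[python, "run_experiment.py", "-c", c] for c in CONFIGS]
    return prepare + run


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # abort on the first failure


run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` from the repository root would execute the commands one after another and stop at the first non-zero exit code.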

## Plotting the Results
Use these helper scripts to visualize the saved Relative RMSE CSVs in `log/<job_name>`:

- Plot for 2 loggers (four-panel line plots against the stratum ratio):
  - Reads a directory like `log/2_loggers_alpha_0.95` and writes `known.png` and `estimated.png` to the same folder.
  - Usage:
    ```bash
    python plot_rel_rmse_2_loggers.py --log_dir log/2_loggers_alpha_0.95
    ```

- Plot for 3+ loggers (per-dataset bar charts):
  - Reads a directory like `log/3_loggers` and saves files such as `3-letter-known.png`, `3-letter-estimated.png`, etc.
  - Usage:
    ```bash
    # Basic
    python plot_rel_rmse_multi_loggers.py --log_dir log/3_loggers

    # Customizations
    python plot_rel_rmse_multi_loggers.py \
      --log_dir log/5_loggers \
      --file_glob "rel_rmse__*.csv" \
      --prefix 5 \
      --format png \
      --dpi 200
    ```